Exchange 2010 \ Samsung Mobile Device Bug

I wanted to reach out to this group to determine if we can collectively put pressure on Microsoft to resolve a bug with Samsung mobile devices and Exchange 2010.

Over the last few months, we've been working a Microsoft Exchange 2010 case targeting a large spike in ActiveSync connections, specifically from Android \ Samsung devices running version 5.x or later. These devices are sending excessive Ping commands to our Exchange servers due to a bug in how Exchange and Samsung maintain keep-alive sessions with ActiveSync. With 44,000 mailboxes, the impact on our network and servers has been noticeable.

In a nutshell, here's what's occurring: a Samsung device issues a Ping request to Exchange, and Exchange 2010 sends a null-valued response that the device can't interpret, which in turn causes the Samsung device to resend the Ping command over and over. The resulting loop saturates the network, generates tons of IIS logs, and eats up CPU \ RAM on the CAS servers due to the 300,000+ Ping requests per day \ per device that are processed.

There's good news and bad news with this Microsoft case. The good news is Microsoft has identified and tagged this as an official bug in how Exchange handles or sends null values to mobile clients. Microsoft has also reported that they addressed this bug in Exchange 2013. The bad news is Exchange 2010 is no longer covered by mainstream bug support, as it ended on January 13th, 2015. In order to get a hotfix issued, you must have an Extended Service Agreement (ESA) with Microsoft, which is not covered by our Microsoft Select or Software Assurance agreement. More importantly, the hotfix will only be issued to the organization with the ESA and is typically not released to the general public. Likewise, enrolling in an ESA ($50K) is not cost effective, as it only allows one bug case and costs $30K for any additional cases. It also provides little help for us, as we only plan on being on Exchange 2010 until May of 2016.

Why this is a growing issue: When we first opened the case (April, 2015) we noticed just over 150 mobile devices that had over 10,000 connections per day. As of today, this number has grown to over 550 devices, 300 of which have over 50,000 connections per day. To put it into perspective, over a 24 hour period that's over 37.6 million connections from fewer than 5% of our total email clients in use, and over 55% of ALL client traffic for all client types (including OWA, EWS, IMAP, POP, ActiveSync and Outlook). In terms of server impact, we went from generating roughly 25 GB to over 100 GB of IIS logs per day, with this number increasing daily. CPU and RAM utilization has also increased. As users continue to upgrade to newer Samsung devices, we believe the logs and connection counts will keep growing. Moreover, we do a lot of log analysis, so this greatly impacts our reporting processes as well.
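To make those figures a little more concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the numbers quoted above (37.6 million connections per day, about 550 devices, about 55% of all traffic); because the post states them as "over" values, the results are approximate lower-bound estimates, and the function name `summarize` is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def summarize(connections_per_day, device_count, share_of_traffic):
    """Derive rough averages from the daily totals quoted in the post."""
    return {
        # Average daily connections for each affected device
        "avg_per_device_per_day": connections_per_day / device_count,
        # Average request rate the CAS servers see from these devices alone
        "avg_per_second": connections_per_day / SECONDS_PER_DAY,
        # Total daily traffic implied by the quoted share
        "implied_total_per_day": connections_per_day / share_of_traffic,
    }


stats = summarize(37_600_000, 550, 0.55)
for name, value in stats.items():
    print(f"{name}: {value:,.0f}")
```

In other words, the affected devices alone would account for several hundred requests per second, around the clock.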

Here's a sample of the connections by device that we are seeing:

| DeviceType     | User-Agent                        | Hits   | Ping   | Sync | FolderSync |
|----------------|-----------------------------------|--------|--------|------|------------|
| SAMSUNGSMG900A | SAMSUNG-SM-G900A/101.40404        | 319077 | 316528 | 2210 | 232        |
| SAMSUNGSGHI337 | SAMSUNG-SGH-I337/101.40404        | 245599 | 245269 | 254  | 75         |
| SAMSUNGSMT800  | Android-SAMSUNG-SM-T800/101.50002 | 224132 | 224004 | 60   | 67         |
| SAMSUNGSMT800  | Android-SAMSUNG-SM-T800/101.50002 | 223904 | 223738 | 95   | 69         |
| SAMSUNGSMG900V | Android-SAMSUNG-SM-G900V/101.500  | 218398 | 218154 | 164  | 78         |
| SAMSUNGSMG900V | Android-SAMSUNG-SM-G900V/101.500  | 209082 | 206348 | 2317 | 410        |
| SAMSUNGSMG900A | Android-SAMSUNG-SM-G900A/101.500  | 206464 | 206207 | 171  | 82         |
| SAMSUNGSMG900V | Android-SAMSUNG-SM-G900V/101.500  | 204597 | 204449 | 78   | 69         |
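One way to read the sample is to look at how much of each device's traffic is Ping. The short Python sketch below computes that share from the Hits and Ping values listed above; the helper name `ping_share` is just for illustration.

```python
# (DeviceType, Hits, Ping) taken from the sample of connections by device
rows = [
    ("SAMSUNGSMG900A", 319077, 316528),
    ("SAMSUNGSGHI337", 245599, 245269),
    ("SAMSUNGSMT800", 224132, 224004),
    ("SAMSUNGSMT800", 223904, 223738),
    ("SAMSUNGSMG900V", 218398, 218154),
    ("SAMSUNGSMG900V", 209082, 206348),
    ("SAMSUNGSMG900A", 206464, 206207),
    ("SAMSUNGSMG900V", 204597, 204449),
]


def ping_share(hits, pings):
    """Fraction of a device's ActiveSync requests that were Ping commands."""
    return pings / hits


for device, hits, pings in rows:
    print(f"{device:<16} {ping_share(hits, pings):.1%}")
```

For every device in the sample, Ping makes up more than 98% of its requests, which is the pattern to look for in your own logs.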

What you can do to help: Check your server logs, verify whether your Exchange environment is impacted too, and open a case. As of this writing, Microsoft has seen several cases opened by other customers exhibiting this same behavior. However, per our TAM and the US Public Sector Support Engineer Manager, the number has not reached a critical mass where the product development cost of an out-of-band hotfix could be justified. To gain momentum on a fix, they need more cases. My hunch is that if customers knew where to look and how to analyze their logs, this issue would be more easily identified. If additional cases are opened, we should be able to place more pressure on the Microsoft product team to fix the bug. Since the issue has already been bugged, it will also be a non-decrement case, just in case you were wondering. (We had right at 29 hours invested into the case, and they were credited back.)

How to analyze your Exchange CAS - IIS Logs for the Samsung connection \ ping bug:

  1. Download and install Microsoft's Log Parser on your scripting server - https://www.microsoft.com/en-us/download/details.aspx?id=24659
  2. On your scripting server, create the following folder structure:
    • C:\Scripts
    • C:\Scripts\IISLogs
    • C:\Scripts\IISLogs\EX-CAS1        (or the name of your servers, etc)
    • C:\Scripts\IISLogs\EX-CAS2     
    • C:\Scripts\IISLogs\Total
  3. Copy the IIS logs for the last 24 hours off all of your CAS servers into the matching server folders. Swap out the EX-CAS1 name with the names of your servers. We need all of the logs, and the file names are the same on every server, so rename them by server to avoid overwriting one server's logs with another's. The more logs you have, the more accurate your results will be. Here's an example that renames the logs by server and moves them into the Total folder, using the command line or a batch file:

    ren C:\Scripts\IISLogs\EX-CAS1\*.log EX-CAS1-*.log
    move C:\Scripts\IISLogs\EX-CAS1\*.log C:\Scripts\IISLogs\Total
    ren C:\Scripts\IISLogs\EX-CAS2\*.log EX-CAS2-*.log
    move C:\Scripts\IISLogs\EX-CAS2\*.log C:\Scripts\IISLogs\Total
    ren C:\Scripts\IISLogs\EX-CAS3\*.log EX-CAS3-*.log
    move C:\Scripts\IISLogs\EX-CAS3\*.log C:\Scripts\IISLogs\Total
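As an alternative to the ren and move commands, here is a minimal Python sketch of the same consolidation step. It copies instead of moving, and it builds each new file name explicitly, since (as far as I can tell) ren with a wildcard target may overwrite the leading characters of the original name rather than prepend the server name. The function name `collect_logs` is made up for this example, and the paths are the ones from step 2; adjust both to your environment.

```python
import shutil
from pathlib import Path


def collect_logs(server_dir: Path, total_dir: Path) -> list:
    """Copy every *.log in server_dir into total_dir, prefixed with the server name."""
    total_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for log in sorted(server_dir.glob("*.log")):
        # e.g. a log from EX-CAS1 becomes EX-CAS1-<original name>
        target = total_dir / f"{server_dir.name}-{log.name}"
        shutil.copy2(log, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    root = Path(r"C:\Scripts\IISLogs")
    for server in ("EX-CAS1", "EX-CAS2"):  # replace with your CAS server names
        if (root / server).is_dir():
            for path in collect_logs(root / server, root / "Total"):
                print("copied", path)
```

Because the originals are left in place, the sketch can be re-run safely if a copy is interrupted.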

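  4. Once the logs from all CAS servers are in a single location, open a Command Prompt and switch to the C:\Program Files (x86)\Log Parser 2.2 directory.

  5. Run the following Log Parser query. This query is essentially a SQL SELECT statement that counts ActiveSync hits per user and device, breaks them down by command (Ping, Sync, FolderSync, SendMail), and dumps the top 1000 results to a CSV file you can open in Excel. The affected Samsung devices should show up at the top of the list.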

Logparser -i:iisw3c "SELECT TOP 1000 TO_LOWERCASE (cs-username) AS User, MyDeviceId AS DeviceId, MyDeviceType AS DeviceType, cs(User-Agent) AS User-Agent, COUNT(*) AS Hits, SUM (MyPing) AS Ping, SUM (MySync) AS Sync, SUM (MyFolderSync) AS FolderSync, SUM (MySendMail) AS SendMail USING EXTRACT_VALUE(cs-uri-query,'DeviceType') AS MyDeviceType, EXTRACT_VALUE(cs-uri-query,'DeviceId') AS MyDeviceId,  EXTRACT_VALUE(cs-uri-query,'User-Agent') AS MyUser-Agent, EXTRACT_VALUE(cs-uri-query,'Cmd') AS MyCmd,  EXTRACT_VALUE(cs-uri-query,'Log') AS MyLog, SUBSTR(TO_STRING(sc-status),0,1) AS StatusCode, CASE MyCmd WHEN 'Sync' THEN 1 ELSE 0 END AS MySync, CASE MyCmd WHEN 'Ping' THEN 1 ELSE 0 END AS MyPing, CASE MyCmd WHEN 'SendMail' THEN 1 ELSE 0 END AS MySendMail, CASE MyCmd WHEN 'FolderSync' THEN 1 ELSE 0 END AS MyFolderSync INTO 'C:\Scripts\IISLogs\Total\ActiveSync_Top-1000-Devices-And-Users.csv' FROM 'C:\Scripts\IISLogs\Total\*.log' WHERE cs-uri-stem LIKE '%%/Microsoft-Server-ActiveSync%%' GROUP BY User,DeviceType,DeviceId,User-Agent ORDER BY Hits DESC"

Note: Depending on how many log files you have, the query could take anywhere from 5-10 minutes or more to run. We process about 100 GB of logs in about 20 minutes.
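If it helps to see what the query is counting, the Python sketch below mimics the EXTRACT_VALUE and CASE logic on a few query strings. The sample cs-uri-query values, user names and device IDs are made up for illustration (real ActiveSync log entries carry more parameters), and `extract_value` and `count_commands` are hypothetical helper names, not part of Log Parser.

```python
from collections import Counter
from urllib.parse import parse_qs

# Hypothetical cs-uri-query values, invented for this example
sample_queries = [
    "Cmd=Ping&User=jdoe&DeviceId=SEC1234&DeviceType=SAMSUNGSMG900A",
    "Cmd=Ping&User=jdoe&DeviceId=SEC1234&DeviceType=SAMSUNGSMG900A",
    "Cmd=Sync&User=jdoe&DeviceId=SEC1234&DeviceType=SAMSUNGSMG900A",
    "Cmd=FolderSync&User=asmith&DeviceId=APPL5678&DeviceType=iPhone",
]


def extract_value(query, key):
    """Rough stand-in for EXTRACT_VALUE: first value for key, or None."""
    values = parse_qs(query).get(key)
    return values[0] if values else None


def count_commands(queries):
    """Count requests per (DeviceId, Cmd), like the SUM(CASE ...) columns."""
    counts = Counter()
    for query in queries:
        device = extract_value(query, "DeviceId")
        cmd = extract_value(query, "Cmd")
        counts[(device, cmd)] += 1
    return counts


counts = count_commands(sample_queries)
for (device, cmd), total in sorted(counts.items()):
    print(device, cmd, total)
```

The real query does the same grouping across every log line, additionally keyed by user, device type and user agent.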

*** Disclaimer: These instructions are only a guide on how to parse IIS log data.  Use these instructions at your own risk. ***
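Once the CSV exists, a quick way to spot affected devices is to filter it by Ping count. The sketch below is one possible approach, assuming the CSV header uses the column aliases from the query (User, DeviceId, DeviceType, User-Agent, Hits, Ping, Sync, FolderSync, SendMail). The 10,000 threshold mirrors the per-day figure mentioned earlier and is only a starting point; the user names, device IDs and the second sample row are invented.

```python
import csv
import io


def noisy_devices(csv_file, ping_threshold=10_000):
    """Return (user, device type, pings) rows at or above the threshold, busiest first."""
    flagged = []
    for row in csv.DictReader(csv_file):
        pings = int(row["Ping"] or 0)
        if pings >= ping_threshold:
            flagged.append((row["User"], row["DeviceType"], pings))
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Hypothetical sample in the assumed output layout of the query
sample = io.StringIO(
    "User,DeviceId,DeviceType,User-Agent,Hits,Ping,Sync,FolderSync,SendMail\n"
    "jdoe,SEC1234,SAMSUNGSMG900A,SAMSUNG-SM-G900A/101.40404,319077,316528,2210,232,0\n"
    "asmith,APPL5678,iPhone,Apple-iPhone,950,600,300,50,0\n"
)

flagged = noisy_devices(sample)
for user, device_type, pings in flagged:
    print(user, device_type, pings)
```

To run it against real output, open the CSV file with `open(path, newline="")` and pass the file object to `noisy_devices` in place of the sample.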

My apologies for the long write-up, but I thought I'd bring this issue to the attention of others who might be experiencing the same behavior and need to reduce load on their servers. Hopefully, as a group, we can get Microsoft to resolve this bug once and for all.

Please share this with any other Exchange engineers who might find this of interest.  We can also provide the contact info of the Microsoft - US Public Sector Manager, if needed.

If you have any questions or comments, please let me know.   

Many thanks,

Ed McKinzie

August 18th, 2015 4:42pm
